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SPARSE APPROXIMATIONS OF PROTEIN STRUCTURE FROM 
NOISY RANDOM PROJECTIONS^ 

By Victor M. Panaretos and Kjell Konis 

Ecole Polytechnique Federale de Lausanne 

Single-particle electron microscopy is a modern technique that 
biophysicists employ to learn the structure of proteins. It yields data 
that consist of noisy random projections of the protein structure 
in random directions, with the added complication that the pro- 
jection angles cannot be observed. In order to reconstruct a three- 
dimensional model, the projection directions need to be estimated by 
use of an ad-hoc starting estimate of the unknown particle. In this 
paper we propose a methodology that does not rely on knowledge of 
the projection angles, to construct an objective data-dependent low- 
resolution approximation of the unknown structure that can serve 
as such a starting estimate. The approach assumes that the pro- 
tein admits a suitable sparse representation, and employs discrete 
L^-regularization (LASSO) as well as notions from shape theory to 
tackle the peculiar challenges involved in the associated inverse prob- 
lem. We illustrate the approach by application to the reconstruction 
of an E. coli protein component called the Klenow fragment 

1. Introduction. The structure of biological macromolecules is at the 
heart of the quest to understand life in purely physical terms, and thus is 
fundamental to any biophysical project. A key element in solving the struc- 
ture of a protein is to be able to visualize the protein in three dimensions, 
both in terms of exterior shape as well as of interior variations. This is a chal- 
lenging task given the microscopic scale of the structures we wish to access, 
which can be less than a nanometer wide. The mechanisms which enable us 
to gain structural information will typically provide indirect knowledge (pos- 
ing inverse problems)^ which will have to be translated into initial structural 
terms in a mathematically sound way. Such mechanisms include X-ray crys- 
tallography and electron microscopy, among others [Drenth (1999), Glaeser 
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Fig. 1. A transmission electron microscope at the MRC Laboratory of Molecular Biology, 
Cambridge, UK. 

et al. (2007)]. The electron microscope (Figure 1) in particular, is a pow- 
erful tool that possesses important advantages over its "competitors," such 
as high scattering power and the retainment of phase information [Chiu 
(1993), Henderson (2004)]. It allows the retrieval of sufficiently detailed 
three-dimensional representations to be able to deduce atomic-level repre- 
sentations of the macromolecules of interest: the chemical structure of the 
particle of interest (i.e., the amino acid sequence) can be fitted (docked) into 
the density map produced via electron microscopy to obtain the complete 
three-dimensional folding of the particle [Glaeser et al. (2007)]. 

At the most basic level, the structure of a protein is determined by the 
spatial configuration of its constituent atoms. The negatively charged elec- 
trons of each atom create a field of electric potential surrounding the atom, 
and when combined, these electric potentials create a density of potential 
p(x^y^z) in space (that can be thought of as a probability density func- 
tion). This is called the shielded Coulomb potential density^ and we usually 
think of a particle as being one and the same as its potential density (it is 
this density that we seek in order to dock the amino acid sequence of the 
particle and completely understand its structure). When placed under an 
electron beam, the potential p causes a reduction to the beam intensity due 
to electron scattering. If the beam is in the z-direction, then the Abbe image 
formation theory [Glaeser et al. (2007)] stipulates that the reduced intensity 
recorded is approximately given by 




p{x, y, z) dz + noise. 
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Essentially, knowledge of the optical density recorded on the film corresponds 
to knowledge of the two-dimensional marginal density of p in the z-direction, 
except for some minor optical effects such as astigmatism, defocus, etc. If 
we were able to obtain multiple such measurements on the same particle 
from various beam directions, then determination of the three-dimensional 
density would amount to the solution of a noisy tomography problem with 
random projection angles. This type of problem is well understood and has 
been extensively studied in the statistical literature, both methodologically 
[e.g., Vardi, Shepp and Kaufman (1985); Silverman et al. (1990); Green 
(1990); Jones and Silverman (1989)] and theoretically [e.g., Johnstone and 
Silverman (1990), O'Sullivan (1995)], principally in its positron emission 
tomography (PET) version. 

It is impossible, however, to image the same particle under many an- 
gles because extended exposure to the electron beam will cause chemical 
bonds of the particle to break, and thus will alter the structure of the spec- 
imen. A means to surpass this difficulty is to crystallize many identical 
particles, and thus distribute the electron dose over multiple occurrences of 
the same structure, but the crystallization process is usually cumbersome, 
time-consuming, and unpredictably varying for different types of particles 
[Glaeser (1999)]. 

Single particle cry o- electron microscopy is a technique of electron mi- 
croscopy that avoids the process of crystallization [e.g., Glaeser (1999)], 
and, as such, it is increasingly popular as a structure determination tool in 
structural biology. The idea is to image a large number of unconstrained 
particles in solution. The particles rotate and diffuse freely in solution, and 
are then rapidly vitrified, having assumed various different random orienta- 
tions. After preliminary processing, the data yielded are essentially a number 
of noisy versions of the projected potential densities, at orientations both 
random and unknown. 

Traditional tomographic techniques break down in this setting, as these 
crucially depend on the knowledge of the projection angles. In order to be 
able to put these techniques to use, biophysicists attempt to estimate the 
unobservable projection angles [Frank (1999)]. To this aim, they typically 
assume a completely specified low resolution form for the unknown density. 
This model often relies on knowledge on the structure of the particle gained 
either from other experiments or from an ad hoc examination of the projec- 
tions by eye. In some cases, this model can be derived from the data using 
the so-called projection- slice theorem [Deans (1993)] for Radon transforms, 
but the success of this approach will depend on the level of noise in an image. 
Once such a prior is given, the unknown angles are considered as param- 
eters to be estimated. When a set of angles is estimated, they are used in 
order to obtain a traditional tomographic reconstruction [Natterer (2001), 
Deans (1993)], and update the starting model [Frank (1999), Glaeser et al. 
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(2007)]. The procedure is then iterated until it stabihzes. Broad classes of 
such refinement methods include the so-called projection matching method 
[Penczek, Grassucci and Frank (1994)] and 3D Radon transform method 
[Rademacher (1994)]. In the first approach, the prior model is projected over 
a wide range of directions, obtaining so-called re-projections. Each data pro- 
jection is then cross-correlated with each re-projection, and is assigned an 
angle corresponding to the angle of that re-projection which produced the 
highest cross-correlation. The 3D Radon approach is essentially equivalent, 
the only difference being that it focuses on the Radon transform rather than 
on the X-ray transform (which are of course very closely related) . Variations 
of these approaches also exist that try to "integrate out" the angles rather 
than estimate them: treating them as unobservable random variables (miss- 
ing data) and using an approach based on the EM algorithm (EM here 
standing for Expectation-Maximization), initialized again by some prior 
model for the structure of the particle [Sigworth (1998), Bern, Chen and 
Wong (2005)]. Indeed, this latter approach draws interesting parallels to the 
methodology of Vardi, Shepp and Kaufman (1985) in the case of positron 
emission tomography. As already mentioned, though, what is common to 
any of these strategies is that they require a completely specified initial esti- 
mate for the structure. In cases where previous structural information is not 
available, the level of noise is relatively high, and a naked eye examination 
is either infeasible (e.g., when the particle has no symmetries) or would best 
be avoided, it is natural to seek approaches to obtaining "objective" initial 
models directly from the data, in order to then initialize approaches such as 
those mentioned above. 

The purpose of this paper is to develop statistical tools that will enable 
the construction of a data-dependent starting model in the noisy setting en- 
countered in practice. If the starting model is to depend only on the data at 
hand, its construction will have to bypass the unknown angles, thus requiring 
the approximate solution of a tomographic problem that has a second layer 
of ill-posedness. Nevertheless, it was seen in Panaretos (2009) that a con- 
sistent formal estimator for the shape of the particle may be constructed. 
However, the problem of the actual construction of an estimate in a prac- 
tical situation still remained open, as the formal estimators introduced in 
Panaretos (2009) are only implicitly defined. Their construction requires 
the solution of further inverse problems, with severe instabilities due to the 
presence of noise, and the approximate nature of the modeling framework. 
In this paper we propose a framework for implementing estimators such 
as those proposed in Panaretos (2009) under sparsity constraints. Our ap- 
proach combines L^-regularization using Least Angles Regression with the 
special geometry of the sample space to yield a procedure applicable to 
actual electron microscope data. We illustrate the approach through an ar- 
tificial example and also by application to noisy single particle projections 
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of the so-called Klenow fragment^ a large protein fragment that is produced 
during DNA polymerase interactions in E. coli. The paper is structured as 
follows. Section 2 provides a statistical formulation of the problem, and some 
relevant background. Section 3 introduces the modeling framework in which 
sparsity is to be understood. Our approach is presented in Section 4 and 
illustrated on an artificial sparse density. Finally, the method is applied to 
single Klenow particles, and an initial sparse approximation of the structure 
is obtained in Section 6. Some concluding remarks are made in Section 7. 

2. Statistical formulation. From the statistical perspective, the problem 
can be phrased as follows. We wish to recover a compactly supported prob- 
ability density function p(x), x= {xi^X2^xs)^ G M^, given noisy discrete 
images of N random projections, 

/+CX) 
p{Unyi)dx'i, n= 1,...,7V, 
-oo 

where {Un}n=i ^ collection of i.i.d. random rotation matrices distributed 
according to normalized Haar measure on 50(3), the group of rotations 
in R^, that is, 

UjUn^I a.s., det((7^) = 1 a.s. 

and 

WUn = UnV VH^, V G 50(3). 

The N discrete noisy profile images {Pn}n=i obtained by sampling the 
projections {pn}n=i ^ regular T x T lattice, subject to corruption by 
additive noise, 

Pn{iJ) = pn{xi,yj) + en{i,j), ij = 1, . . . ,r. 
It will be assumed that the noise arrays are independent, white and Gaus- 
sian, en{i^j^) ^'-^ ' A/'(0, cr^). The (more or less) standard problem of tomog- 
raphy would be described by 

(2.1) Recover p(x) given {(P„ t/,)}^^i. 

However, in the single particle setup, the rotations {Un} are unobservable, 
leading to the perturbed problem 

(2.2) Recover p(x) given {Pn}n=i' 

The difference between these two problems is fundamental. Every established 
technique for solving problem 2.1 (e.g., based on singular value decompo- 
sition, likelihood, smoothed backprojection and Fourier methods) crucially 
depends on the knowledge of the projection directions {Un}- In the absence 
of these directions, the estimation problem is not even well defined: it is easy 
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to see that the density p(x) is unidentifiable, since any rotated/reflected ver- 
sion p((5x), with Q^Q = /, win generate data with the same distributional 
properties. Intuitively, this means that one cannot recover an exact coordi- 
nate system for the density. Although this is conceptually obvious, it can 
be a serious hurdle to statistical estimation: for example, if one wishes to 
parametrize the unknown density using a Fourier expansion, the Fourier 
coefficients will not be invariant to changes of the coordinate system. 

Nevertheless, the shape of the density p can potentially be recovered 
[Panaretos (2009)]. The shape of p, denoted [p], encodes the totality of char- 
acteristics of p that are invariant with respect to the coordinate system 

[p] = {p(U^):Ue 0(3)}, 

where we denote the group of orthogonal transformations in by 0{d). 
Furthermore, it was seen in the same paper that the shape of the projec- 
tion pn^ [pn] = {Pn{U:K.) : U G 0(2)}, constitutes a sufficient statistic for [p]. 
Hence, identifiability combined with the sufficiency principle would lead one 
to consider estimating [p] on the basis of estimators depending on the data 
solely through their shape characteristics [pi], . . . , [pn]- 

Unfortunately, any likelihood-type approach turns out to be completely 
intractable in this setup. However, the feasibility of extracting an estimator 
from the random projections, without any recourse to the angular com- 
ponent, suggests that one might consider techniques that yield inefficient 
estimators that are nevertheless "efficient enough" to serve as a starting 
model for an iterative procedure that estimates angles, conducts traditional 
tomography, and iterates until the reconstruction stabilizes. Less formally, 
one can set to obtain a rough initial approximation that nevertheless cap- 
tures the essential features of the object that are required to obtain a first 
set of angular estimates. In the next section we formulate these approxima- 
tions through a class of sparse radial mixtures. These provide, on the one 
hand, a means to fruitfully parametrize the notion of shape, and, on the 
other hand, a natural way to impose sparsity. 

3. Sparse approximations by radial mixtures. The key to our approach is 
the realization that approximating the unknown density by a relatively sim- 
ple object suffices, if the aim is to obtain a starting reconstruction. Indeed, 
ad-hoc starting models used by biophysicists often consist of collections of 
solid spheres. The class of approximate models that we shall be pursuing is 
that of radial mixtures, 

K 

p(x) = ^gfc0(x-/x^), K>1, {/xJcM^ gfc>0, 

(3.1) 

K 
i=l 
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with (/)(•) a radial probabihty density function on (e.g., an isotropic Gaus- 
sian density), that is, 

(3.2) </,(y) = </,(;7y) VyGM3,[/GO(3). 

Radial mixtures comprise a flexible yet tractable class of models for density 
estimation [see, e.g., Hastie, Tibshirani and Friedman (2001), Chapter 6, 
Section 7]. The choice of this class is especially well suited to this problem, 
as it offers two technical advantages: 

(1) Good behavior under rotation and projection: the rotated version of p 
according to U ^ S0(3) is given by 

K K 

k=l k=l 

that is, by a radial mixture of the same densities with the same mixing 
coefficients, but centered at the rotated location parameters {Uij,j^}. The 
projected density at orientation U will then be given by 

/+00 ^ r-\-oo 

((7p)(xi,X2,X3)dX3 = ^g/e / (^{yi - U ^j^) dx^ 
-oo k=l ^ —oo 

K 

^^(lk^{H{^-Uiij,)), 

k=i 

where H is the identity matrix with its last row deleted, and (f is the (unique) 
two-dimensional marginal of 0, 



H 



1 
1 



/ + CX) 
-oo 



(2) The possibility of a finite-dimensional parametrization of the shapes 
of p and of a projection p using the Gram matrix of the original and projected 
location parameters, respectively, 

[p] = (Gram({/xJ),{gJ), [p] = (Gram({i/t//x J), {g^}), 

where for a collection of K vectors {wj}f=i^ Grann({wj}) represents the 
symmetric nonnegative matrix with (i, j)-element equal to (w^,Wj). 

In Kendall's Shape Theory, Gram matrices are employed as a coordi- 
nate system for the shape manifold induced by rigid motions [Kendall et al. 
(1999), Kendah and Le (2009)]. Note that if the vectors {^j}f=i are ar- 
ranged as the columns of a 3 x K matrix W, then we may simply write 
Grann(VF) = VF^VF. This Gram matrix encodes all the invariant character- 
istics with respect to 0(3) of the configuration {wj}, since it is invariant 
under orthogonal transformations of the generating vectors: for 5 E 0(3) we 
immediately see that Gram(SH^) = W^B^BW = W^W = Gram(H^). Fur- 
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thermore, given a Gram matrix of rank one can find K vectors in R^, 
d> with centroid zero whose pairwise inner products are given by that 
Gram matrix (in fact, the specification of such an ensemble amounts to 
merely solving nondegenerate lower triangular linear systems of equations). 
Therefore, for a given density (f) (or projected density (/p), the Gram matri- 
ces coupled with the corresponding mixing proportions comprise a complete 
description of the shapes of the original and projected densities, respectively. 

A further importance of this parametrization is that it provides an in- 
terface with the finite-dimensional case, where projected shape is better 
understood [Panaretos (2006, 2008), Le and Barden (2010)]. In particular, 
it allows use of the following simple connection between projected shape and 
original shape: 

Theorem 3.1 [Panaretos (2009)]. Let {^k}k=i ^ configuration of K 
vectors in and let U be a random element of S0(3) satisfying WU = 
UV = U, for any W,V e S0(3) . Then 

E[Gr3^^{{HU^^^k}tl)] = iGram({w,}f^i), 
where H is the 3x3 identity matrix with its last row deleted. 

Based on this result, Panaretos (2009) proved, for known K and under 
the assumption that for i j we have qi ^ qj , that the hybrid maximum 
likelihood/method of moments estimator p(x), 

K 

(3.3) /3(x) = qk(j){^ - fik) 

k=i 

is consistent modulo 0(3), as the resolution of each image T x T and the 
number of projections N grows. Here, fij^ is any collection of K vectors in 
with Gram matrix 

o ^ 

n=l 

The {qk} and {HUn/J^k} maximum likelihood estimators of the common 
mixing proportions and the individual projected location parameters for 
each profile, stemming from the loglikelihood 

ii{qk},{HUn^lk}) 

(3.4) 

^ N T T r K 

°^~iVT2 JZ] Pn{i,3)-^(1W{H{^-Uniik)) 

n=l 1=1 j=l K k=l 

The latter loglikelihood stems from the independence between projections 
and between pixels, and the Gaussian assumption on the noise. Notice that 
each of the vectors HUn/J^k treated as a separate parameter. 
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4. Reconstruction of a nearly black protein. Although the latter devel- 
opment provides a consistent solution to the problem from a theoretical 
perspective, it does not provide a solution to the practical problem. The 
estimator defined formally as p cannot be readily constructed given a data 
set of projections, as it is implicitly defined through the likelihood equa- 
tion (3.4). The optimization of the objective function given by the latter 
equation is a separate challenge of its own — not in terms of computational 
tractability, but in terms of accuracy and stability. Among the reasons for 
this is the dimension of the search space (which is 2KN + K) . This can 
to some extent be mitigated, if one is to obtain separate likelihood esti- 
mates within each projection image, breaking the overall problem into N 
independent problems, each with search space dimension ?>K (and then seek 
a global estimate for the mixing proportions). But, more importantly, it is 
the highly nonlinear form of the objective function (3.4) and the possibility 
of parameters being almost unidentifiable (when projected means fall close 
to one another) that presents the most serious complications in a practical 
reconstruction. The objective function admits multiple local optima that are 
in addition unstable to minor perturbations of the noise term (the search 
surface has multiple relatively flat peaks). This instability of the nonlinear 
likelihood function is a manifestation of an inherent ill-posedness, which is 
clearly revealed once we re-express the problem as a collection of decon- 
volution problems to be solved given discrete data: 



Here, 5 denotes Dirac's delta function. In this format, the problem is seen 
to be a linear inverse problem in the unknown function /i(x) := ^k=i Qk^{'^ — 
HUntJ^k)- The solution of such a problem would require regularization through 
the imposition of some norm penalty on the function h that we wish to re- 
cover. This cannot work here because the unknown function to be recovered 
is a Dirac comb — which is not an element of and hence does not allow 
Hilbert space regularization methods. This is simply a different way of saying 
that the problem cannot be treated as a linear one: if we are interested in the 
locations themselves (the spikes), the problem is fundamentally nonlinear. 

Our basic idea to tackle this problem is to turn the drawback of "singular- 
ity" into an advantage by transforming the nonlinear problem into a linear 
problem through discretization of the solution search space. While the func- 
tion we seek to recover is not well behaved when considering it as defined 
over a continuous domain, it reduces to a very simple object once thought of 
as a high-dimensional vector. This simplicity is reflected through sparseness. 



{{HUnt^i, • • • . HUn^iK)i (<?i, • • • , Qk)} 
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In particular, suppose that we relax our search problem and ask to recover 
the image pixels that contain spikes, rather than the precise spike locations 
themselves. Choose a projection n — n^) and omit the index for simplicity. 
Then, the problem can be approximately expressed via the following linear 
equation: 

(4.1) Vt'^xI = '^T2xT2/5t2x1 +^T2xl- 

Here, V is the vectorized image, obtained by stacking the columns of the 
image matrix P. The matrix X is constructed as follows: the pih column of X 
is a vectorized (by column) image (which we call a base profile) generated 
by placing a single density at the center of the jth pixel. More precisely, 
let Uj be the center of the jth pixel. Then, the pth column of X is given by 
the vector 

where j runs so that the pixels are arranged in column-major order. The 
parameter vector /3 is a x 1 vector containing at most K nonzero entries: 

Wh<K. 

These entries reveal which pixels contain spikes. Since the entire density 
must be contained within the image boundaries, we a priori fix entries of /3 
that correspond to pixels near the boundary to be zero (or, alternatively, 
drop the corresponding columns from the matrix X). Finally, e is an i.i.d. 
Gaussian mean-zero error vector. 

Thus, in this discrete form, the problem has been reduced to a model se- 
lection problem in linear regression: we wish to recover the nonzero entries 
of /3, which will reveal the approximate spike locations. The key observation, 
of course, is that /3 is sparse: we expect that K <^T^ ^ so that we are at- 
tempting to recover a nearly black object in the terminology of Donoho et al. 
(1992). It therefore seems quite appropriate to employ a shrinkage estimator 
in this setting. There are various possibilities, but it is the LASSO [Tibshi- 
rani (1996)] that arises as the most natural one in the setting of this problem 
[see also Hastie, Tibshirani and Friedman (2001)]. Specifically, observe that 
the nonzero entries of /3 should be equal to the mixing proportions corre- 
sponding to the respective spikes (in case multiple spikes fall within the same 
pixel, then it would be the sum of the corresponding mixing proportions). 
Since the object in question is a probability density, we must have 

t2 k 

1=1 1=1 

and we are naturally led to the following L^-constrained least squares prob- 
lem in /?: 

(4.2) minll-p - XI3\\l subject to ||/3||i = 1. 
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Since the specifications of the problem determined the precise value for 
the Li penalty, there is even no need to perform cross-validation to determine 
the bandwidth parameter. In practice, of course, the total mass m of the 
density will not be precisely known, and may slightly differ from projection 
to projection. However, an approximate value m can be easily estimated. 
Therefore, one can employ the Least Angle Regression (LARS) algorithm 
[Efron et al. (2004)] to compute the whole LASSO path, and calibrate the 
results around a small neighborhood of the bandwidth corresponding to the 
approximate mass m. 

In order to illustrate the details (and effectiveness) of this discrete regu- 
larization approach, we revisit an artificial example presented in Panaretos 
(2009), where a three-dimensional mixture of four Gaussian kernels was to be 
recovered given its projections at randomly chosen directions. The pseudo- 
particle potential density in three dimensions was given by 

^'-'^ ^^"^ ^ ^ — r 

where u = {u^, Uy, eR^, a = 0A6, gi = 0.18, ^2 = 0.26, ^3 = 0.21, ^4 = 0.35, 
and with {/x^} given by /x^ = (0, 0.8, -0.3)^, = (0.7, -0.4, -0.3)^, /Xg^ 
(—0.7, —0.4, —0.3)^, /X4 = (0, 0, 0.8)^. The corresponding signal-to-noise level 
for the projections (understood as the ratio of the signal to the noise vari- 
ance) was at the level of 61 : 1. 

The method employed in Panaretos (2009) to perform the deconvolutions 
required for the construction of the estimator was a direct spectral approach 
based on results on Toeplitz forms [Grenander and Szego (1958), Pisarenko 
(1973)]. The approach performed well on noiseless projections, but would fail 
completely even with very small amounts of noise. This effect is easily antic- 
ipated as the Toeplitz form approach amounts to an approximate discrete 
version of deconvolution by unregularized inversion of Fourier coefficients — 
which is bound to be highly unstable in the presence of noise. 

In order to produce a reconstruction in the presence of noise, we imple- 
ment the LASSO deconvolution approach on the basis of = 150 noisy 
random discrete profiles. The typical profile (Figure 2) is given by a dis- 
cretized image V = {P(z, j)} defined as 

K 

(4.4) P{iJ) = '^qk(p{uij\jlk,(T'^) + £{ij), z, j = 1, . . . , 64, 

k=i 

where e{i^j) are i.i.d. Gaussian with mean and standard deviation 10""^; 
(/p(-|z/, cr^) is a spherical bivariate Gaussian density with mean v and vari- 
ance cr^; Uij is the center of the (z, j)th image pixel; {ilk}k=i locations 
of the 4 (unobservable) projected means in that profile: 
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Fig. 2. Three random noisy profiles of the mixture (4-S). The digits indicate the true 
locations of the projected component means. 



Since each image contains a region that is known a priori to be "empty" (i.e., 
does not contain a projected mean), we hmit our interest to pixels in the 
complement of that region. Let ^ be the set of indices p G {1, . . . , T^} such 
that the pixel centers satisfy \\urp\\ <w (Figure 3). We call the elements 
of {up'.p E ^} candidate means and build the convolution matrix X as 

The choice of the tuning parameter w is made so as to ensure that the base 
profiles integrate to (approximately) one. In the specific example, the choice 
It; = 7r/3 is seen to be sufficient. 

We used the LARS algorithm (in particular, the lars function in the lars 
package [Hastie and Efron (2011)] for the R Project for Statistical Comput- 
ing [R Development Core Team (2011)]) in order to compute the complete 




Fig. 3. Illustration of the restriction of the support for a discrete profile, (a) A discrete 
profile with T = 8. The pixel centers Up are denoted by gray dots, (b) The set of candidate 
means for the convolution matrix X . 
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Fig. 4. The same three random profiles, with black dots indicating the locations of the 
nonzero LASSO coefficient estimates. The digits indicate the true locations of the projected 
component means. 



regularization path (in t) for the LASSO problem 
(4.5) minll-p - XPWl subject to 



and retained the parameter estimates /3 provided for t shghtly less than the 
estimated mass m (to avoid overfitting) . The latter was estimated by the 
average total intensity of the projections. Figure 4 depicts three character- 
istic noisy random profiles, along with the pixels the LASSO picked out as 
candidate locations for mean parameters. We denote the centers of these 
pixels as {up}p^^^ where 

Since the true locations of the projected means will almost certainly not 
be contained in A^, and since the discrete representation of the convolution 
will only be approximate, a fit with precisely the right number K of nonzero 
parameters {K = 4 in this case) can rarely be achieved (i.e., |^| 7^ K). How- 
ever, the nonzero parameters found by the LASSO will tend to bracket the 
locations of the projected means, as can be noticed in Figure 4. It therefore 
suffices to use a naive clustering rule to associate nonzero LASSO parameter 
estimates with projected means: if two pixels selected by the LASSO share 
either an edge or a corner, then they belong to the same cluster. 

Let {^k}k=i denote the K clusters comprising ^, so that = yf^i 
where [+) denotes a disjoint union. Then, the estimates of the locations of the 
projected means are computed by taking a weighted average of the locations 
of the nonzero lasso parameter estimates in each cluster using the parameter 
estimates as the relative weights. 



^pe^k ^P 
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Further, the sum of the LASSO parameter estimates in each cluster pro- 
vides an initial estimate of the mixing weight associated with that cluster. 
Once all of the mixing weights have been initially estimated, the final esti- 
mates are achieved by scaling the initial mixing weights so that they sum 
to m, the estimated total mass of the particle, 

Qk := — - — ^^m. 

Note here that this is the estimate of the mixing proportions stemming 
from a single profile (these will later be combined to produce a global esti- 
mate). To mitigate any bias in the mixing coefficients estimates introduced 
by choosing a constraint parameter less than 1, one may allow the constraint 
parameter to increase so long as the number of clusters remains constant. 
That is, the constraint parameter is increased either until it is equal to 1 or 
until any further increase would spawn a new cluster. Often, this results in 
one or more additional nonzero lasso parameter estimates joining the current 
clusters. 

Once a set of estimated mixing proportions and mean locations has been 
obtained for each projection, these are used in order to construct the hybrid 
estimator (3.3). An intermediate step required is building the estimated 
Gram matrices for each projection consistently. That is, we should ensure 
to the extent possible that estimated location parameters that correspond 
to the same three-dimensional mean should share the same index. For this 
reason, within each profile, the estimated location and mixing parameters 
are relabeled according to the ascending ordering of their mixing proportions 
(which were assumed to be distinct); that is, the indices are assigned so that 

qi<q2<"' <qK- 

Once the labels have been consistently assigned to the location estimates 
within each profile, one may obtain a single profile likelihood estimate for the 
mixing proportions, by solving the ordinary least squares problem obtained 
when plugging the estimated location parameters into the loglikelihood (3.4). 

Finally, the estimated Gram matrices and the single set of estimated 
mixing proportions are used to construct the hybrid estimator (3.3). Figure 5 
shows the original pseudo-particle in contrast with the estimated version. 
The reconstruction was based on 53 of the 150 profiles in our sample, for 
which four clusters were more or less clearly identifiable. In the majority of 
these profiles, the 4 clusters correspond to the component means. However, 
from time-to-time, one of the clusters was a false positive. In these cases, 
the smallest mixing weight was far smaller than typical for the sample. To 
further filter these profiles out, we rejected profiles with left-outlying mixing 
proportions (left outlying values were omitted when calculating the mixing 
weights in Table 1). 



RECONSTRUCTION OF SPARSE PROTEINS 



15 




(b) 

Fig. 5. The actual pseudo-particle density and the estimated density, (a) A level sur- 
face of the true pyramid density from 3 different vantage points, (b) Corresponding level 
surfaces of the estimated pyramid density. 



The estimated Gram matrix and corresponding estimated location param- 
eters are contrasted below with the true values (the estimated locations of 
components in 3 dimensions jl can be computed by solving a simple system 
of linear equations) : 
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(Mi M2 f^3 M4) 



(Al A2 A3 A4) 




0.825 
0.000 
0.000 



0.834 
0.000 
0.000 



-0.275 
0.806 
0.000 



-0.211 
0.784 
0.000 



-0.335 
-0.405 
0.678 



-0.275 
-0.409 
0.695 



0.275 
0.397 
0.695 



0.289 
0.380 
0.678 




Interestingly enough, the reconstruction procedure was not severely af- 
fected by higher levels of noise contamination. Even when the noise variance 
was increased by two orders of magnitude, leading to a 1 : 1 signal-to-noise 
ratio, the reconstructed version of the particle was not significantly per- 
turbed (see Figure 6). Since it is the deconvolution step that is the most 
ill-posed aspect of our approach, this can be largely attributed to a note- 
worthy degree of stability exhibited by the LASSO as a means for deconvo- 
lution. 

5. More on the geometry of the problem. The implementation of the 
LASSO based hybrid estimator to the almost black pseudo-particle of the 
previous section brings to the surface two potential issues that might arise 
when implementing the procedure to actual proteins (as will be done in 
Section 6). We consider these in the next two paragraphs. 

5.1. Using fewer projections. The first point relates to the usability of 
all the profiles available. It was seen that several projections were not used 
because the viewing angles that they represented caused problems in the con- 
struction of the estimator. Nevertheless, the estimator constructed seemed 
to be rather successful. It is therefore natural to wonder if one could do 
with far fewer projections. This will become especially relevant in practical 
situations where a number of projections might not present well-identifiable 
mixture means. The answer is in the affirmative, that is, one can typically 
use a very small number of projections, as is illustrated by the next lemma. 

Lemma 5.1. Let Hi,H2,Hs be projection matrices of rank 2 acting 
on and {/j^i, . . . , fij^) be an ensemble of K vectors in R^. // the ranges of 
I — Hi^I — H2, I — Hs are pairwise orthogonal, then 



Proof. Since the rank of the projection matrices involved is 2, we may 




i=l 




i = l,2,3. 
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Sm-. 61 SNR: 10 SKH: 1 

(ffS = io-«) (ff* s= 6.1 « 10-*) (^7^ = 6.1 X 10-^) 




Fig. 6. Reconstructions under different signal-to-noise scenarios. Each column corre- 
sponds to a different noise level. The first row presents a typical profile along with the 
candidate mean positions obtained via the LASSO. The second row presents the same 
profiles without any noise, and the corresponding candidate mean positions obtained via 
the LASSO. The last two rows present two different viewpoints of the final reconstruction 
obtained. 
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Furthermore, since the images of / — Hi, I — H2, and / — are pairwise 
orthogonal, it must also be that {ei}^^^ be pairwise orthogonal, thus con- 
stituting an orthonormal basis for M^. Letting V denote the 3 x K matrix 
with (fii, . . . , /x^) as its columns, it follows that 

Gram({i?,M Jf=i) = E hJ H,V = ( ^ ^ 

i=l i=l \i=l / 

= 3V^V - V^ieieJ + eaej + egej)!^ = 2V^V 
= 2Gram({Aijf=i). □ 

Lemma 5.1 allows us to heuristically reinterpret the estimator G of the 
Gram component given by 

o 1 ^ 

n=l 

by thinking of it as grouping the data into triads of nearly orthogonal 
views, forming an estimator within each triad using Lemma 5.1, and then 
averaging these estimators. 

It follows that, in principle, only a few random projections at unknown 
angles suffice to reconstruct a Gram matrix — provided that we can arrange 
them in groups that represent views that carry information from relatively 
different viewpoints. In practice, one cannot know whether projection angles 
are indeed orthogonal, since they are unknown. However, one can try to 
identify classes of profiles that appear to be carrying significantly different 
profile information, and use these as a proxy. The procedure is illustrated in 
the next section. 



5.2. Consistent construction of Gram matrices. The second issue that 
became apparent from the pseudo-particle example has to do with the con- 
sistent construction of the Gram matrices across different profiles. This con- 
struction hinges on the assumption that the mixing proportions are distinct. 
The formula defining the pseudo-particle guaranteed that the mixing pro- 
portions were indeed distinct, and allowed us to successfully construct the 
estimator. In practice, it is natural to expect situations where the mixing 
proportions for certain components are not significantly different, leading to 
instabilities in the construction of the estimated Gram matrices. To address 
this problem, we can take advantage of the special geometry of the problem 
and, in particular, the fact that the projections of a radial basis function can- 
not lie in a totally arbitrary subspace of the set of 2D radial basis functions: 
the locus of projections is highly constrained, a fact that may be exploited in 
order to assign mixing proportions in a way that attempts to respect these 
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Fig. 7. The Roman surface from three different vantage points. 



constraints. The constraints on the radial basis functions induce correspond- 
ing constraints on the support of the projected Gram matrices, forcing this 
support to depend cruciahy on the three-dimensional original Gram matrix 
(i.e., we are dealing with a nonregular problem). Specifically, a projected 
Gram matrix cannot be any arbitrary nonnegative definite symmetric ma- 
trix. The locus of possible projected Gram matrices ^ comprises a smooth 
surface in M^^^ of (intrinsic) dimension 2. The idea is therefore that an ar- 
bitrary permutation of the entries of a projected Gram matrix induces an 
abrupt change in its location relative to ^, typically mapping it far from ^ . 
In principle, we should thus be able to choose an arrangement of the en- 
tries of a projected Gram matrix so as to make it "closest" to the locus of 
"allowed" Gram matrices. 

To be more precise, if V is any 3 x A: matrix such that G — V^V ^ then the 
projected Gram matrix at direction given by e = (ei, 62, 63)^ G is defined 
as 

G(e) = 1/^(7 -ee^)y. 

As e ranges over the unit sphere, the matrix / — ee^ ranges over the real 
projective space (the sphere with antipodal points identified). This real pro- 
jective space can be visualized in three dimensions as the Roman surface (see 
Figure 7), using the mapping 3 (61,62,63)^ ^^ (6263,6163,6162)^ [Apery 
(1987)]. The effect of pre-multiplying by and post-multiplying by V 
is to stretch this Roman surface according to the singular values of V ^ 
rotate it by its left singular vectors and finally shift it [much like a full 
column rank d x n matrix transforms the sphere into an (n — 1)- 

dimensional ellipsoid in R^]. This can be seen directly by using Kronecker 
products: 

vec{G(e)} = Yec{V^{I - ee^)V} = {V^ ® V^) vec{(/ - ee^)}. 

In practice, the estimated Gram matrices will not lie precisely on the locus W 
since their construction is subject to error. However, we expect that they 
should lie close to this surface. Therefore, given a Gram matrix that is de- 
termined up to a permutation of its entries, one can select the arrangement 
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of its entries so as to minimize its distance from the underlying locus ^ . Of 
course, the exact locus ^ will not be known in practice, as it is in bijec- 
tive correspondence with the unknown three-dimensional Gram matrix G. 
However, an initial estimate of the surface ^ can be constructed using those 
projected Gram matrices for which correspondences are known. The proce- 
dure is illustrated in the next section, where we construct a sparse initial 
model for the potential density of a real biological particle. 

6. Application: Sparse approximation of a Klenow fragment. We now 

turn to demonstrate our approach on noisy projections of an actual biolog- 
ical particle called the Klenow fragment. The Klenow fragment is a large 
protein fragment that is produced in E. coli when DNA polymerase reacts 
with certain enzymes [Klenow and Henningsen (1970)]. The data set we 
will consider consists of 250 noisy projections of the actual known structure 
of the particle, produced in silico, mimicking the behavior of the electron 
microscope, and kindly provided by Professor Andres Leschziner, Harvard 
University [for a detailed description of the data generation methodology, 
see Leschziner and Nogales (2006), Sections 2.1 and 2.2]. A sample of twelve 
of these projections is depicted in Figure 8. The projection signal-to-noise 
ratio is at the level of 3 : 1. 









m 














i 





Fig. 8. A sample of twelve projections from the Klenow fragment data set. 
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6.1. Identifiability and blind deconvolution. A brief visual inspection of 
these projections should make it immediately clear that, unlike the synthetic 
particle example treated earlier, the Klenow fragment does not fit precisely 
within the sparse radial mixture framework. However, it is also apparent 
that if it is a coarse first order approximation that we are interested in, then 
the sparse radial model is quite reasonable. Nevertheless, the approximate 
nature of this representation will have certain implications: 

(1) The isotropic density function on which the radial representation is 
based, is unknown. In essence, this means that the deconvolution problem at 
hand is a blind deconvolution problem, as the point spread function itself is 
poorly determined. Fortunately, we will see that the discrete deconvolution 
approach based on the LASSO remains successful even when the convolution 
matrix is approximate. 

(2) It is likely that only a subset of the projections will be usable, because 
several of the projections may involve projected means that lie close to one 
another, hence pushing to the limit of unidentifiability. 

(3) The mixing proportions corresponding to the best fitting radial rep- 
resentation have no guarantee of being well separated. Therefore, we will 
need to make use of the special geometry of the problem, as the estimated 
mixing weights will not be sufficient for labeling the components. 

We begin by applying the LASSO deconvolution procedure to each of the 
250 profiles. Since the isotropic density for the expansion is unknown, we 
employ a Gaussian kernel using a = 0.224 — a value chosen experimentally 
(and which will later be refined). Interestingly, we observed that employing 
different kernels (or even different scale parameters) did not significantly 
influence the results, provided that a was not too large. Even though the 
point spread function was more or less arbitrarily selected, the LASSO de- 
convolution procedure produced highly sensible output (some examples are 
shown in Figure 9), providing evidence to the effect that the procedure is 
relatively robust to perturbations of the point spread function, provided that 
it remains isotropic and relatively concentrated. 

Since several profiles fell into the "almost unidentifiable" regime, we se- 
lected three classes of profiles where the location parameters seemed well 
identified and that comprised relatively different viewpoints of the particle. 
The classes were constructed by choosing a generating profile and then se- 
lecting additional profiles that appeared to be reflections or rotations of the 
generating profile. Class 1 consisted of profiles where the LASSO deconvo- 
lution procedure identified 6 component means. Classes 2 and 3 consisted of 
profiles where the LASSO deconvolution procedure identified, respectively, 
5 and 4 component means. The three classes are shown in Figure 9. Our ex- 
perience showed that only very few particles are actually required to obtain 
a good reconstruction and so we limited class membership to four particles 
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Fig. 9. Three classes of 4 profiles. The labeling of the components was obtained by taking 
the mixing weights in descending order, (a) Class 1: profiles with 6 identifiable projected 
component means, (b) Class 2: profiles with 5 identifiable projected component means. 
(c) Class 3: profiles with 4 identifiable projected component means. 



per class (in the pseudo-particle example, we observed that as few as a dozen 
could be used to produce an excellent reconstruction). 

The next steps require determining the correct labeling of the compo- 
nents within each class relative to the generating profile, and consistently 
combining the Gram matrix estimates from each class to obtain an overall 
estimate of the Gram matrix. To describe these steps, we use the nota- 
tion /i^^ "^^ and q^^ to denote, respectively, the estimated mean and mixing 
weight of the fcth component in the jth profile of class i. Additionally, we 
use /i^^'-^^ to denote the matrix with columns "^^ q^^'^"^ to denote the 

^(i-j) 

vector with elements , /c = 1, . . . , K^. 

6.2. Labeling the projected component means within a class of profiles. 
Since each class consists of profiles that are assumed to be approximately 
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Fig. 10. The target profile is roughly a reflection of the reference profile. The initial 
labeling of the components in the target profile was obtained from the estimated mixing 
weights and does not agree with the reference profile. However, the alignment algorithm 
finds the correct correspondences, (a) Reference; (b) target: before; (c) target: after. 

rotations or reflections (plus some small perturbation) of the generating 
profile, the Gram matrix generated by any member of the class should be 
close to the Gram matrix generated by the generating profile when the 
corresponding components have the same labels. This suggests the follow- 
ing Procrustean algorithm for determining the correspondences between the 
projected component means in a candidate profile and those in the generat- 
ing profile: 

(1) Make a list of all possible labelings of the components in the candidate 
profile. 

(2) For each labeling /, compute the quantity di = \\Gr — Gi\\f where Gr 
is the Gram matrix generated by the reference profile, Gi is the Gram matrix 
generated by the candidate profile with labeling / and || • ||i? is the Frobenius 
matrix norm. 

(3) Choose the labehng corresponding to the smallest d/. 

An example is shown in Figure 10. The correspondences within each class 
can now be obtained by applying these steps, in turn, to each candidate 
profile in the class. 

6.3. Estimating the Gram matrix. We begin by using Theorem 3.1 to 
produce an initial estimate of the Gram matrix for Class 1: 



It should be noted that while each individual Gram matrix within this 
class encodes an ensemble that is intrinsically two-dimensional (i.e., has 
rank 2), the Gram matrix obtained by the averaging procedure does not nec- 



(6.1) 
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essarily encode an ensemble that can be imbedded into a two-dimensional 
plane (i.e., the averaged matrix has rank higher than 2). This provides some 
intuition on the workings of the inversion procedure: if all the projections 
within a class were identical, the average would be exactly of rank 2, so that 
the averaging provides no three-dimensional information. However, none of 
the class members represent precisely the same orientation. These minor 
perturbations provide some three-dimensional information, even though not 
dramatic: the resulting matrix might no longer be of rank 2, but its third 
singular value will be relatively small- the three-dimensional ensemble it en- 
codes is almost two dimensional. The intuition is that when Gram matrices 
from further classes are added in (representing significantly different ori- 
entations), the ensemble generated by the averaged Gram matrix becomes 
"more three-dimensional." 

In fact, there is no guarantee that a Gram matrix formed by averaging 
several rank-2 Gram matrices will have rank 3: the rank may actually end 
up being higher. For this reason, we further make a rank 3 approximation 
of Gi using its singular value decomposition. Let 

Gi = UiDiV^ 

be the singular value decomposition of Gi and define 

where U[ and V{ are, respectively, the first 3 columns of Ui and Vi and D' 
is a diagonal matrix containing the first 3 singular values of Gi . 

In class 2, we have K2 — Ki — 1 — hence, we assume that one of the 
identified means in class 2 has multiplicity 2 (i.e., we assume there are in 
fact 6 components in the true density and that the projections of two of them 
fall sufficiently close in the profiles in class 2 that the LASSO deconvolution 
approach identifies them as a single component). Also, we note that the 
largest component is sufficiently distinct that it can be used reliably to 
identify the first component. 

Following from these assumptions and in order to obtain a 6 x 6 Gram 
matrix "compatible" with Gi, we consider ensembles of means of the form 

for A: = 1, . . . , 5 and j = 1, . . . , 4. The idea is to use the geometrical properties 
introduced in the previous section, to choose that candidate Gram matrix 
which lies closest to the locus of projected Gram matrices. The latter is 
unknown, but we may approximate it by the locus generated by Gi, which 
constitutes itself an estimator of the unknown three-dimensional Gram ma- 
trix. Let V — {Pi} for / G >C be the set (with index /) of all 6 x 6 permutation 
matrices that leave the first row unchanged. We then build the set of can- 
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Fig. 11. (a) Sampled points on the Roman surface generated by Gi. (b) Point cloud of 
all possible permuted Gram matrices for class 2, relative to the Roman surface generated 
by Gi . The red point corresponds to the point of minimum distance. 



didate Gram matrices with elements 

(6.2) Gl, = ^^Y.^r^n^{[f,('■^ 

for ah combinations of / G £ and /c G {1, . . . , 5}. 

Our measure of affinity to the locus generated by Gi is the Euclidean 
distance between the candidate Gram matrix and this locus: the stretched 
Roman surface generated by Gi. In practice, this distance is computed 
by randomly sampling a set of points on the Roman surface, then tak- 
ing the minimum distance between each of these points and the candi- 
date Gram matrix (Figure 11). Of course, the induced distribution on the 
Roman surface will no longer be uniform, but it is not necessary that it 
be. All that is required is a relatively good coverage of the surface. We 
use a sample of 1,000 points on the perturbed Roman surface, defined 
as 

Sn = Vl{I - Unul)V{^, n = 1, . . . , 1,000, 

with {un} being unit random vectors and (^i)^^i = Gi. We then define the 
distance 



d{Gik) = mm\\Gik S, 



n\\F- 



Taking / and k such that d{Gjj^) is minimum, we proceed to compute an 
initial estimate of the Gram matrix from the profiles in classes 1 and 2 as 



(6.3) 6., = ^ 



■ 4 4 
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(a) (b) (c) 

Fig. 12. Estimated density for the Klenow fragment, (a) View point 1; (b) view point 2; 
(c) view point 3. 

The final estimate G12 is obtained by making a rank 3 approximation of G12 
using the singular value decomposition. 

Finally, we proceed analogously for the remaining class. In class 3, we have 
= Ki —2 = 4, hence, we assume that either one of the identified means 
has multiplicity 3, or two identified means have multiplicity 2. Again, to 
obtain a 6 x 6 Gram matrix, we consider ensembles of the form 

\fP-^^ A^) f^^\ 

where fci < A:2 G {1, . . . , 4}. The overall Gram matrix estimate is again com- 
puted by generating a set of candidate Gram matrices and taking the con- 
figuration that yields the smallest distance to the stretched Roman surface 
generated by G12, then by updating the Gram matrix estimate as above. 

As a by-product of estimating the Gram matrix, we now know where to 
place each of the 6 component means in any given profile, and which com- 
ponent means correspond to which from profile to profile. Consequently, we 
may estimate the mixing weights and the tuning parameter cr^ using linear 
regression. Given a candidate value for cr^, we can construct N convolution 
matrices {Af^ cr^j^^i (corresponding to the N profiles) as described in Sec- 
tion 4. We thus obtain N linear regression problems, one for each projection. 
By stacking the corresponding convolution matrices into a single N'^ x 6 ma- 
trix, we obtain a single regression for the 6 common mixing weights, and 
estimate the latter by ordinary least squares. The procedure can be per- 
formed for different choices of cr^ on a prespecified grid, retaining the set of 
mixing weight estimates that correspond to the regression with the best fit. 
The estimate for thus obtained for the Klenow fragment was — 0.0571. 

The sparse reconstruction produced by employing the estimated Gram 
matrix and mixing coefficients is depicted in Figure 12. In order to appre- 
ciate the "fit" of the sparse reconstruction to the data, we construct noisy 
projections from the reconstruction and contrast them to several typical pro- 
jections of the Klenow fragment. The projections of the reconstructed model 
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are constructed so as to mimic the effects that the microscope induces on 
the profiles (astigmatism, defocus, contrast transfer function effects) and so 
as to be characterized by a signal-to-noise ratio similar to that of the actual 
projections. A sample of such contrasts is given in Figure 13. We observe 
that, even though rather sparse, the reconstructed density is able to capture 
the main features of the projections quite successfully. This hints that the 
reconstructed density can be appropriate to use as a starting model. What is 
especially important is that the data produced by our reconstruction seems 
to be highly consistent with the actual Klenow data even for viewing angles 
that were not used in the reconstruction; in fact, this remains true even for 
viewing angles that fall in the unidentifiability regime. Figure 14 shows the 
corresponding residual deviation heat-maps. We observe that underestima- 
tion (corresponding to yellow/orange regions) occurs in the regions between 
the components of the Gaussian mixture — evidently, there is mass there that 
cannot be captured by the Gaussian mixture. There are also some regions 
where overestimation occurs (darker green regions), mostly close to the cen- 
ter of blob-like components of the particle profiles, principally due to the 
fact that the mixing components of the Gaussian mixture will obviously 
have relatively different higher-order concentration characteristics from the 
blob-like components of the actual particle. For example, in Figure 14(f), 
we observe slight overestimation of the density at locations corresponding to 
the center of the blob-like components of the actual profile. These two are 
the only systematic patterns that appear in the residuals, and are evidently 
attributed to the bias introduced from our regularization via the Gaussian 
mixture model employed. 

7. Concluding remarks. Despite the severely ill-posed nature of noisy 
random tomography, this paper demonstrates that it is practically feasi- 
ble to obtain useful three-dimensional structural information on a protein 
given only noisy projections at random and unknown angles. The approach 
proposed to this effect rests on two basic elements: the imposition of a cer- 
tain degree of sparsity on the required reconstruction, and the exploitation 
of the special geometry that is intrinsic to tomography data and provides 
valuable information. Though the sparsity assumption will typically lead to 
a relatively coarse-grained approximation to the protein under investigation, 
this is precisely what is required: a low-resolution starting model that can 
be used as a reference structure to iteratively recover the unknown angles 
to then produce a high-resolution reconstruction based on traditional non- 
parametric tomographic techniques (once the projection angles have been 
estimated, it is no longer necessary to maintain the mixture model). 

While it had previously been theoretically demonstrated in Panaretos 
(2009) that it is feasible to reconstruct a three-dimensional object in this 
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Fig. 13. Ten pairs of projections. Each pair contains an actual Klenow fragment projec- 
tion (left) coupled with a projection from our sparse approximation (right). 
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(d) (e) (f) 




(g) (h) (i) 

Fig. 14. Heat-maps of residual deviations of the fitted projections from the actual pro- 
jections. Each of the suhfigures (a)-(i) is generated from the corresponding pairs (a)-(i) 
in Figure 13. 

setting (up to an orthogonal transformation), obtaining an explicit recon- 
struction in practice remained elusive. By employing a radial basis repre- 
sentation of the unknown protein, the problem of structure determination 
was reduced to the problem of recovering the Euclidean shape of the ensem- 
ble of location parameters of the radial functions and the associated mixing 
coefficients. This was done in two steps: nonlinear deconvolution and shape 
averaging. In the deconvolution step, the projected location parameters had 
to be identified within the noisy projections. Since the nature of the radial 
expansion representation is approximate, the deconvolution problem was 
blind. 
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To tackle this problem, our approach relaxed the nonlinear deconvolution 
problem into a linear problem by considering its discretized version with an 
approximately chosen point spread function. When seen in this setting, the 
problem falls precisely in the framework of sparse model selection. Since the 
object to be recovered is a density function, the LASSO arose as the most 
natural technique to attack the problem, with an penalty corresponding 
to a requirement on the total mass of the density to be recovered. Despite 
the fact that the exact point spread function was unknown, it was seen 
that the LASSO performed extremely satisfactorily where other solution 
approaches break down. This was true both in the setting of artificial ex- 
amples, as well as in the setting of protein data. Once the projections of the 
location parameters had been deconvolved, the averaging step was carried 
out. This required the recovery of the correspondences between location pa- 
rameters in different projections. To this effect, our approach exploited the 
nonregularity of the tomography problem: it was seen that the Gram ma- 
trices of the k X k projected components are constrained to lie in a smooth 
two-dimensional subset of M^^^, which was identified as a deformed Roman 
surface. This was then exploited in order to choose consistent correspon- 
dences across projections. 

The methodology was applied with success both to projection data arising 
from an artificial example, as well as to projections of an actual protein 
component, the Klenow fragment. Especially in the latter case, it was seen 
that the sparse reconstruction recovered from the noisy projection data can 
very well serve as a starting model, since its typical projections are highly 
similar with those of the projection of the true structure (Figure 13). It is 
therefore likely that our approach will provide a useful means to obtaining 
objective data-dependent starting models in the context of single particle 
electron microscopy. From the purely statistical perspective, the use of the 
LASSO in the setting of double blind deconvolution can be of independent 
interest when seen in the context of estimation of mixtures of scale-location 
densities of an unknown family. 
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